
filter_lua: Add chunk mode for processing multiple records #8478

Closed

Conversation

@drbugfinder-work (Contributor) commented Feb 12, 2024

This PR introduces a chunk_mode for the Lua filter. It enables use cases such as parallelization (see Lua Lanes).

Please note that in this mode the Lua function takes only two arguments:

function process_records(tag, records)
  if records and type(records) == "table" then
    for i, record_row in ipairs(records) do
        local timestamp = record_row.timestamp
        local record = record_row.record

        print("Timestamp entry:", timestamp.sec, timestamp.nsec)
        print("Record entry:", record.message)
    end
  else
    print("Error: Invalid 'records' table or nil")
  end
  return records
end

Its configuration looks like this:

[FILTER]
    Name          lua
    Match         my_logs
    script        lanes_example.lua
    call          process_records
    chunk_mode    On
    time_as_table On

The returned table must be in the same format (table of timestamp and record pairs).
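
As an illustrative sketch only (the chunk_processed field is a hypothetical placeholder, not part of the feature), a callback that enriches every record and hands back a table in that same shape could look like this:

-- Sketch: return rows in the same shape they were received,
-- i.e. { timestamp = { sec = ..., nsec = ... }, record = { ... } }
function process_records(tag, records)
  local out = {}
  for i, row in ipairs(records) do
    row.record["chunk_processed"] = true  -- hypothetical enrichment field
    out[i] = { timestamp = row.timestamp, record = row.record }
  end
  return out
end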

This mode currently only supports time_as_table by default, and it always emits the returned records; there is no return code to set.

A use case for this is the parallel execution of Lua filters using the Lua Lanes library.
Please see the example here (remember to install Lua Lanes first, e.g. apt install luarocks && luarocks install lanes, and check the path in lanes_example.lua).
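
The following is not the actual lanes_example.lua referenced above, only a minimal sketch of the per-record-lane idea, assuming Lua Lanes is installed and the worker field is a hypothetical placeholder:

-- Minimal sketch: one lane (OS thread) per record, results joined afterwards.
local lanes = require("lanes").configure()

-- the worker function runs in its own Lua state; the row table is copied into it
local process_row = lanes.gen("*", function(row)
  row.record["worker"] = "lane"          -- hypothetical field set by the worker
  return row
end)

function process_records(tag, records)
  local handles = {}
  for i, row in ipairs(records) do
    handles[i] = process_row(row)        -- launches a lane for this record
  end
  local out = {}
  for i, h in ipairs(handles) do
    out[i] = h[1]                        -- indexing the handle joins the lane and returns its result
  end
  return out
end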

Please see valgrind output:

Documentation PR:
fluent/fluent-bit-docs#1310


Enter [N/A] in the box if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • [N/A] Run local packaging test showing all targets (including any new ones) build.
  • [N/A] Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • Documentation required for this feature

Backporting

  • [N/A] Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

drbugfinder-work added a commit to drbugfinder-work/fluent-bit-docs that referenced this pull request Feb 12, 2024
Documentation for fluent/fluent-bit#8478

Signed-off-by: Richard Treu <richard.treu@sap.com>
drbugfinder-work added a commit to drbugfinder-work/fluent-bit-docs that referenced this pull request Feb 19, 2024
Documentation for fluent/fluent-bit#8478

Signed-off-by: Richard Treu <richard.treu@sap.com>
@agup006 (Member) commented Feb 22, 2024

@tarruda would you take a look?

@tarruda commented Feb 23, 2024

@drbugfinder-work Thanks for this PR.

I quickly skimmed through the PR and the Lua Lanes documentation to understand the end goal, and I assume this is how everything will work:

  • The only difference in chunked mode is that it can pass multiple records at once to the Lua callback.
  • Lua Lanes is a library that lets you spread processing across multiple OS threads by spawning multiple Lua states (one per thread), running the function in each thread, and then returning the results to the main thread.
  • Since the callback receives multiple records, your end goal is to split the records so that they can be processed in parallel.

(Please let me know if I missed something.)

With that in mind, I have some questions:

  • How many records can fluent-bit pass to the Lua filter at once?
  • Have you done any benchmarking with a large volume of records and compared it with the traditional approach?
  • Will Lua lanes create a new thread every time you ask it to run a function?
  • How is the data shared between threads in lua lanes (is there serialization/deserialization to pass records across threads)?

The reason these questions are important is that this approach might have a lot of overhead, and it adds quite a lot of complexity to the lua_filter implementation. Spawning threads is expensive and only worth it if you have a huge amount of data. If Fluent Bit can never pass more than a few hundred records to the Lua callback, I suspect splitting the workload across threads will make things even slower (this is why I suggest benchmarking).

@drbugfinder-work (Contributor, Author) commented Feb 23, 2024

Hi @tarruda,

You're absolutely right with that summary. The main goal is to parallelize record processing in Lua. However, there might be other use cases that could make use of processing more than one record at once within Lua (e.g. trend analysis).

I'm testing this in different environments, especially in K8s clusters with a couple hundred instances of Fluent Bit. In high-load scenarios, I can see chunks of around 300-500 records coming into the Lua filter per call. Under normal circumstances it fluctuates between 1 and 50 records per filter call, depending on the log volume in the pipeline and the Flush settings.
However, a single CPU core is no longer the limit: I now see up to 4 cores in total being utilized per Fluent Bit instance in our environment. Of course there is some overhead from using the Lua Lanes library, creating threads, etc., but it increases our maximum throughput by roughly 200-300%, as seen in our benchmarks. The limiting factor is now the main Fluent Bit pipeline again, not the Lua script.

Please keep in mind that this change to the plugin does not make direct use of the Lua Lanes library; that is just one use case. The library has to be installed by the user if it should be used. How the records are distributed to different worker threads within Lua is up to the user. I've added a simple example (without a limit on worker threads) to the documentation that shows how to create a thread for every incoming record (this of course creates an OS thread for every record, which may or may not perform well). Users can employ the Linda objects of the Lanes library to share data between threads if needed.
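
As a purely hypothetical sketch of that Linda idea (not the example from the documentation PR; the via_linda field is an invented marker), records could be pushed through a Linda to a single worker lane like this:

-- Sketch: share records with a worker lane through a Linda object.
local lanes = require("lanes").configure()
local linda = lanes.linda()

-- the worker pulls rows from the Linda until it receives the 'done' sentinel
local worker = lanes.gen("*", function(l)
  local processed = {}
  while true do
    local _, row = l:receive("rows")     -- blocks until a row (or the sentinel) arrives
    if row == "done" then break end
    row.record["via_linda"] = true       -- hypothetical marker field
    table.insert(processed, row)
  end
  return processed
end)

function process_records(tag, records)
  local h = worker(linda)                -- start one worker lane for this chunk
  for _, row in ipairs(records) do
    linda:send("rows", row)              -- rows are deep-copied into the Linda
  end
  linda:send("rows", "done")
  return h[1]                            -- join the worker and emit its result
end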

I don't think this adds much complexity to the Lua filter plugin itself, as it just passes a table of records instead of a single record to Lua, but it gives you the option to make use of all the available data in that chunk.

@tarruda left a comment

@drbugfinder-work Since you are seeing an increase in throughput and this change is backwards compatible, LGTM

github-actions bot commented Jun 8, 2024

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions bot added the Stale label Jun 8, 2024
@drbugfinder-work (Contributor, Author) commented:

Over the last months we noticed a memory leak (not detected by Valgrind) caused by this change.
Please do not merge until this is fixed.
